Feature sparse linalg solvers by abagusetty · Pull Request #2841 · IntelPython/dpnp

abagusetty · 2026-04-09T03:38:05Z

Adds support for from dpnp.scipy.sparse.linalg import LinearOperator, cg, gmres, minres
Fixes: #2831

Have you provided a meaningful PR description?
Have you added a test, reproducer or referred to an issue with a reproducer?
Have you tested your changes locally for CPU and GPU devices?
Have you made sure that new changes do not introduce compiler warnings?
Have you checked performance impact of proposed changes?
Have you added documentation for your changes, if necessary?
Have you added your changes to the changelog?

…oneMKL hooks - _interface.py: add full operator algebra (.H, .T, +, *, **, neg), _AdjointLinearOperator, _TransposedLinearOperator, _SumLinearOperator, _ProductLinearOperator, _ScaledLinearOperator, _PowerLinearOperator, IdentityOperator, MatrixLinearOperator, _AdjointMatrixOperator, _CustomLinearOperator factory dispatch; extend aslinearoperator to handle dpnp sparse and dense arrays - _iterative.py: add _make_system (dtype validation, preconditioner wiring, working dtype selection); add _make_fast_matvec CSR/oneMKL SpMV hook; fix GMRES Arnoldi inner product to single oneMKL BLAS gemv (dpnp.dot) instead of slow Python vdot loop; offload Hessenberg lstsq to numpy.linalg.lstsq (CPU, matches CuPy); fix SciPy host-fallback tol->rtol deprecation via _scipy_tol_kwarg; add preconditioner support to CG; keep MINRES as SciPy-backed stub Refs: CuPy v14.0.1 cupyx/scipy/sparse/linalg/_interface.py, cupyx/scipy/sparse/linalg/_iterative.py"

…gmres, minres Modeled after CuPy's cupyx_tests/scipy_tests/sparse_tests/test_linalg.py. Covers: - LinearOperator: shape, dtype inference, matvec/rmatvec/matmat, subclassing, __matmul__, __call__, edge cases - aslinearoperator: dense array, duck-type, identity passthrough, rmatvec from dense, invalid inputs - cg: SPD convergence, scipy reference match, x0 warm start, b_ndim=2, callback, atol, LinearOperator path, invalid inputs, non-convergence info check - gmres: diag-dominant convergence, scipy reference match, restart variants, x0, b_ndim=2, callbacks, complex systems, atol, non-convergence info check, Hilbert-matrix stress test - minres: SPD, symmetric-indefinite, scipy reference, shift parameter, non-square guard, LinearOperator path, callback - Integration: parametric (n, dtype) cross-solver tests via LinearOperator - Import smoke tests: __all__ completeness

- Use dpnp.tests.helper: assert_dtype_allclose, generate_random_numpy_array, get_all_dtypes, get_float_complex_dtypes, has_support_aspect64 - Use dpnp.tests.third_party.cupy testing harness (with_requires, etc.) - Use numpy.testing assert_allclose / assert_array_equal / assert_raises - Use dpnp.asnumpy() instead of numpy.asarray() - Use pytest parametrize ids matching existing test conventions - Use is_scipy_available() helper from tests/helper.py - Strict class-per-solver organisation matching TestCholesky / TestDet etc.

…or dtype Two bugs fixed: 1. _init_dtype() was calling dpnp.zeros(n) which defaults to float64, so a float32 matvec would upcast and return float64, making the inferred dtype wrong. Fix: use dpnp.zeros(n, dtype=dpnp.int8) as SciPy/CuPy do — any numeric matvec will promote int8 to its own dtype. 2. _CustomLinearOperator.__init__ called _init_dtype() even when an explicit dtype was already supplied, overwriting the caller's value. Fix: _init_dtype() now short-circuits when self.dtype is already set.

…iterative.py

…ption handling Align gemv.cpp with the conventions established in blas/gemm.cpp: Headers added: - ext/common.hpp (dpctl_td_ns, consistent with other extensions) - utils/memory_overlap.hpp (MemoryOverlap guard on x vs y) - utils/output_validation.hpp (CheckWritable + AmpleMemory on y) - utils/type_utils.hpp (validate_type_for_device<T> in impl) - <sstream> (needed for stringstream error_msg) Exception handling added in sparse_gemv_impl(): - try/catch(oneapi::mkl::exception) around all oneMKL sparse calls - try/catch(sycl::exception) around all oneMKL sparse calls - release_matrix_handle cleanup in the exception error path - throw std::runtime_error with descriptive message on catch Input validation added in sparse_gemv(): - ndim checks: x and y must be 1-D - queues_are_compatible() across all 5 USM arrays - MemoryOverlap()(x, y) aliasing guard - CheckWritable::throw_if_not_writable(y) - AmpleMemory::throw_if_not_ample(y, num_rows) - keep_args_alive() at function exit (was missing, returning empty event)

… table Modeled after blas/gemm.cpp (2-D table: value type x index type) and blas/gemv.cpp (dispatch vector pattern with ContigFactory + init_dispatch_table). Changes: - Add sparse/types_matrix.hpp with SparseGemvTypePairSupportFactory<Tv, Ti> encoding the 4 supported combinations: {float32,float64} x {int32,int64} - Rewrite sparse_gemv_impl() to take typeless char* pointers (matching the blas gemv_impl signature style) — type info flows through template params only, no runtime branching inside the impl - Replace the 60-line if/else val_typenum/idx_typenum chain in sparse_gemv() with a 2-D dispatch table lookup (gemv_dispatch_table[val_id][idx_id]) - Rename init_sparse_gemv_dispatch_vector -> init_sparse_gemv_dispatch_table and implement it via init_dispatch_table<> from ext/common.hpp - All validation guards and exception handling from prior commit are preserved

…se_gemv_dispatch_table Follows the rename made in gemv.cpp when the dispatch mechanism was changed from a 1-D vector to a 2-D table (value type x index type). All other declarations (sparse_gemv signature, parameters) are unchanged.

The oneMKL 2025-2 sparse BLAS API deprecated the old 8-argument set_csr_data(queue, handle, nrows, ncols, index_base, row_ptr, col_ind, values, deps) overload in favour of a new signature that takes the sparse matrix handle as `spmat` and adds an explicit `nnz` argument: set_csr_data(queue, spmat, nrows, ncols, nnz, index_base, row_ptr, col_ind, values, deps) Fixes: - Replace old set_csr_data call with the new nnz-aware signature - Silences the resulting -Wunused-parameter warning on `nnz` (now used) - No functional change; all other logic is unchanged

…tring Line 477: `hasattr(A, "rmatmat\")` had a Markdown-escaped backslash leaked into the Python source, causing an unterminated string literal. Fixed to `hasattr(A, "rmatmat")`.

dpnp.ndarray blocks implicit NumPy conversion via __array__ to prevent silent dtype=object arrays. All test assertions must use .asnumpy() to materialize device arrays onto the host explicitly. Also replaces numpy.asarray(x_dp) in _rel_residual helper.

…dation order - _iterative.py: raise NotImplementedError for M != None *before* the _HOST_N_THRESHOLD SciPy fast-path in cg() and gmres(), so the contract is enforced regardless of system size (fixes test_cg_preconditioner_unsupported_raises, test_gmres_preconditioner_unsupported_raises). - _iterative.py: validate callback_type and raise NotImplementedError for 'pr_norm' *before* the _HOST_N_THRESHOLD branch in gmres(), so small-n systems also see the error (fixes test_gmres_callback_type_pr_norm_raises). - _iterative.py: pass callback_type='legacy' to scipy.sparse.linalg.gmres when delegating on the fast path to suppress SciPy DeprecationWarning. - test_scipy_sparse_linalg.py: add dtype=numpy.float64 to expected arange() calls in test_identity_operator and test_gmres_happy_breakdown so strict NumPy 2.0 dtype-equality checks pass (float64 result vs int64 expected).

… port SciPy corner cases

…ecurrence

- Replace .asnumpy() method calls with dpnp.asnumpy() module fn (asnumpy is not an ndarray method in dpnp; it is a top-level fn) - Fix dpnp.any(x) ambiguous truth value in x0 zero-check; replace with explicit `x0 is not None` guard for r0 initialisation - Fix V_mat.T.conj() -> dpnp.conj(V_mat.T) in GMRES Arnoldi step - Guard minres beta sqrt against tiny negative floats: sqrt(abs(...)) - Unify GMRES Hessenberg h_np assignment to avoid .real stripping producing wrong dtype for complex systems - Fix float() cast on dpnp scalar norm inside GMRES inner h_j1 line

…failures) The committed code used hypot(gbar, oldb) as delta_k which is the gamma (norm) from the PREVIOUS rotation step, not the correct diagonal entry from applying the previous Givens rotation to the current column. The correct Paige-Saunders (1975) two-rotation recurrence is: oldeps = epsln delta = cs * dbar + sn * alpha # apply previous rotation gbar_k = sn * dbar - cs * alpha # residual -> new rotation input epsln = sn * beta dbar = -cs * beta gamma = hypot(gbar_k, beta) # NEW rotation eliminates beta cs = gbar_k / gamma sn = beta / gamma w_new = (v - oldeps*w - delta*w2) / gamma # three-term update This matches scipy.sparse.linalg.minres and Choi (2006) eq. 6.11. The buggy recurrence produced solutions ~1.08x away from the true solution (rel_err ~1e0) instead of the expected ~1e-13. Co-authored-by: fix-minres-recurrence

antonwolfy · 2026-05-11T22:49:08Z

+
+pybind11_add_module(${python_module_name} MODULE ${_module_src})
+add_sycl_to_target(TARGET ${python_module_name} SOURCES ${_module_src})
+


Suggested change

# Ensure Cython modules build first so _usmarray.h exists

add_dependencies(${python_module_name} _usmarray)

antonwolfy · 2026-05-11T22:55:09Z

+// dpctl namespace alias for type dispatch utilities
+namespace dpctl_td_ns = dpnp::tensor::type_dispatch;


Suggested change

// dpctl namespace alias for type dispatch utilities

namespace dpctl_td_ns = dpnp::tensor::type_dispatch;

// namespace for operations with types

namespace dpnp_td_ns = dpnp::tensor::type_dispatch;

antonwolfy · 2026-05-24T10:11:23Z

 * Added implementation of `dpnp.scipy.linalg.lu` (SciPy-compatible) [#2787](https://github.com/IntelPython/dpnp/pull/2787)
 * Added support for ndarray subclassing via `dpnp.ndarray.view` method with `type` parameter [#2815](https://github.com/IntelPython/dpnp/pull/2815)
 * Migrated tensor implementation from `dpctl.tensor` into `dpnp.tensor`, making `dpnp` the primary owner of the Array API-compliant tensor layer [#2856](https://github.com/IntelPython/dpnp/pull/2856)
+* Added implementation of `dpnp.scipy.sparse.linalg import LinearOperator, cg, gmres, minres` [#2841](https://github.com/IntelPython/dpnp/pull/2841)


need to move to 0.21.0 release

antonwolfy · 2026-05-24T10:42:01Z

+        # Lazy import: dpnp.scipy.sparse may import this module during
+        # package initialisation, so we cannot import it at module scope.
+        # pylint: disable=import-outside-toplevel
+        from dpnp.scipy.sparse import issparse


It seems issparse is missed and not implemented currently

Thanks for taking a peek, I cleaned it up

antonwolfy · 2026-05-24T10:59:13Z

+# Helpers for constructing SPD, diagonally dominant, and symmetric
+# indefinite test matrices. Kept small and local, matching the style of
+# vvsort() at the top of test_linalg.py.
+def _spd_matrix(n, dtype):


Can we add CuPy tests to dpnp/tests/third_party/cupyx/scipy_tests/sparse_tests/test_linalg.py to check the compatibly?

Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com>

…agusetty/dpnp into feature-sparse-linalg-solvers

Closes the performance gap where csr_matrix.dot densified the matrix via toarray() before every dpnp.dot, defeating the point of CSR outside the solver fast path. cupyx routes csr_matrix.dot directly through cuSPARSE SpMV; we now do the equivalent with oneMKL. _csr.py - Add _ensure_spmv_handle: lazily allocates a oneMKL matrix_handle (set_csr_data + optimize_gemv, once) and caches it on the instance. Skipped when the compiled backend extension is absent or the (value, index) dtype pair is not in the dispatch table -- the instance silently uses the dense fallback in that case. - Add _spmv_supported predicate; tighter than the old check in _make_fast_matvec because it now validates both value AND index dtype against the C++ dispatch table, preventing a runtime ValueError from the backend for unsupported index types. - Rewrite dot(x) to dispatch sparse::gemv for 1-D x of matching dtype; preserve the dense fallback for 2-D x (batched SpMM is a separate oneMKL entry point not wired up yet). - Add __del__ to release the cached handle. The handle attrs are initialised at the very top of __init__ so __del__ never sees a partially constructed object even if an _init_* helper raises. - Add __matmul__ = dot so 'A @ x' takes the same fast path. linalg/_iterative.py - _CachedSpMVPair.matvec now drives the csr's own cached handle instead of building a second forward handle, so a user-issued A.dot(x) and a solver-issued matvec(x) share a single handle. The adjoint handle is still owned by _CachedSpMV because csr only caches the forward direction. - _make_fast_matvec collapses to a probe of A._ensure_spmv_handle: one None-check replaces three (extension import, dtype guard, handle init try/except). The downstream behaviour is identical but the dtype check now covers indices, not just values. linalg/_interface.py - Rewrite aslinearoperator in the canonical SciPy/CuPy dispatch order: LinearOperator -> sparse -> dense ndarray -> duck-type. Remove the unreachable second isinstance(dpnp.ndarray) block and the conflicting ndim policy (was both <=2 and ==2 in different branches). Promote 1-D dpnp arrays via atleast_2d to match CuPy. - Explicitly reject numpy.ndarray with a directed message instead of falling through to the generic 'type not understood'. Silent host->device upload would mask real device/queue selection bugs. - Drop the try/except (ImportError, AttributeError) wrapping the sparse import. dpnp.scipy.sparse is always present in this build; swallowing those exceptions hid real failures. _base.py - Docstring only: document why the legacy isspmatrix / isspmatrix_csr aliases are intentionally omitted (dpnp has no spmatrix vs sparray discriminator, and modern SciPy/CuPy code is steered toward issparse + A.format == 'csr').

…; add cupyx tests Five bug fixes informed by a full audit of LinearOperator, cg, gmres, and minres against the upstream SciPy and cupyx reference implementations, plus the cupyx-style compatibility tests requested in the PR review. _interface.py - _init_dtype: replace the float64 trial vector with int8. The float64 path silently upcast every dtype-inferred operator to float64, breaking float32 and complex64 workflows and contradicting the commit-message claim of an earlier fix that did not actually land. Matches scipy/cupyx semantics: int8 is the lowest-precedence numeric dtype and lets the matvec promote to its natural output dtype. - matvec / rmatvec: turn the second line of the dimension-mismatch error message into an f-string. Previously reported the literal text '{x.shape}' to users. _iterative.py (cg) - Fix the info contract: previously returned -1 for an rz/pAp breakdown. SciPy and cupyx reserve info < 0 for illegal-input errors only and use info > 0 for any non-convergence (maxiter exhausted, breakdown, etc.). User code that branched on 'if info > 0' was silently broken. Now report the iteration index at which the solver gave up. - Document the info contract in the docstring so the regression cannot resurface unnoticed. _iterative.py (gmres) - Cast b_norm to host once via float(...) instead of leaving it as a 0-D device tensor. The 'if b_norm == 0.0' guard and the per-iter 'if r_norm <= atol' comparison previously triggered an implicit __bool__ sync each time they evaluated -- exactly the sync pattern the module docstring claims to avoid. Track r_norm_host explicitly so the loop condition and the post-loop info check operate on host scalars; the residual norm is still computed device-side via dpnp.linalg.norm. - Final 'not bool(r_norm <= atol)' replaced with the equivalent host comparison 'r_norm_host > atol'. _iterative.py (minres) - Cast beta1 = float(dpnp.inner(r1, y)) before the < 0 and == 0 guards. Previously the guards compared a 0-D device tensor to a Python int, forcing an implicit sync inside each branch. - Use numpy.sqrt on the already-host beta1 instead of dpnp.sqrt + float() round-trip. tests/third_party/cupyx/scipy_tests/sparse_tests/test_linalg.py - New file (with sibling __init__.py for the package) requested in the PR review thread. Modelled on cupy/tests/cupyx_tests/scipy_tests/sparse_tests/test_linalg.py and fitted to dpnp's in-tree testing harness (dpnp as cupy, the dpnp.tests.third_party.cupy.testing helpers, scipy as the reference via testing.with_requires('scipy')). - Coverage: * LinearOperator: explicit-dtype preservation, int8 dtype inference, dimension-mismatch raise, __matmul__ dispatch, H property. * aslinearoperator: pass-through, dense dpnp.ndarray wrap, csr_matrix wrap, explicit numpy.ndarray rejection. * cg: dense-SPD convergence vs scipy, x0 warm-start, info > 0 when maxiter is hit (guards the breakdown-contract fix above), zero-rhs returns zero. * gmres: diag-dominant convergence vs scipy, restart parameter, info > 0 when maxiter is hit. * minres: symmetric-indefinite convergence vs scipy, shift parameter, zero-rhs returns zero. * Module surface: import smoke test for the five PR-promised names (LinearOperator, aslinearoperator, cg, gmres, minres).

…ost Hessenberg lstsq Closes audit items IntelPython#9, IntelPython#15, IntelPython#16, IntelPython#17, IntelPython#19 from the prior solver review. All four touch the iterative-solver inner loop and were left out of the previous correctness-only commit because they require a new C++ binding. backend/extensions/blas/gemv.{cpp,hpp}, blas_py.cpp - Extend the typed gemv_impl signature with alpha and beta as doubles; the per-T impl casts them to the matrix value type at dispatch time. dpnp.dot and other legacy callers are unaffected -- the existing gemv() public function now forwards (1.0, 0.0) to the shared gemv_dispatch helper. - Add gemv_alpha_beta() entry point and a _gemv_alpha_beta pybind method computing y = alpha * op(A) * x + beta * y for caller- supplied scalars. Required by the GMRES Arnoldi fast path which fires gemv with (alpha=1, beta=0) writing into a Hessenberg column slice, then (alpha=-1, beta=1) fusing u -= V @ h into one kernel. For complex matrices the scalars are always one of {-1, 0, 1} and so survive the cast exactly; the impl docstring flags the silent imag-loss caveat for any other complex caller. - Refactor: hoist the existing validation / strides / dispatch plumbing into a static gemv_dispatch helper so both entry points share identical behaviour without duplication. scipy/sparse/linalg/_iterative.py - _make_compute_hu now takes both V and H. The closure writes the projection coefficients h = V[:, :j+1]^H @ u directly into the Hessenberg column slice H[:j+1, j] via a single gemv with the output pointing at that slice -- no intermediate h buffer, no slice-assign copy (audit item IntelPython#16). Pass 2 fuses the AXPY-style update u = -V @ h + 1 * u into one gemv with alpha=-1, beta=1 -- no tmp buffer, one kernel instead of gemv-plus-subtract (audit item IntelPython#19). For complex V the (j+1)-element h slice is conjugated in-place between the two passes (V^T -> V^H), negligible cost next to the n*(j+1) gemv. - Switch the Hessenberg least-squares H y = e from a device dpnp.linalg.lstsq (which dispatches an SVD kernel for a tiny 21x20 problem per restart) to numpy.linalg.lstsq on the host (audit item IntelPython#17). Matches CuPy's choice and removes a device- side SVD launch that on Intel GPU dominates the restart cost for the default restart=20. RHS e is now maintained as a numpy array; H is copied via dpnp.asnumpy once per restart and the resulting y is shipped back as a dpnp array for the V @ y update. - V[:, j+1] = v retained as a single contiguous USM slice store (audit item IntelPython#15 closes as no-change-required: the assignment is already one dpnp op on an F-order buffer and there is no fused 'normalise-then-store' path without further binding work). - cg per-iter syncs collapsed from 3-4 down to 1 (audit item IntelPython#9). The pAp and rz_new breakdown checks are no longer transferred to the host on every iteration; instead the loop relies on IEEE-754 inf / NaN propagation through alpha = rz / pAp. When pAp underflows the resulting alpha is inf or NaN, poisons the next residual via x += alpha * p and r -= alpha * Ap, and the single norm sync at the top of the next iteration detects the breakdown via numpy.isfinite(rnorm_host) and exits with info > 0. Mirrors the cuBLAS-style CG inner loop (nrm2 + scalar test, one host barrier per iter); the initial rz breakdown guard remains so a zero preconditioned residual still short- circuits correctly. tests/third_party/cupyx/scipy_tests/sparse_tests/test_linalg.py - test_gmres_complex_arnoldi_fast_path: complex-dtype regression guard for the conjugate-in-place branch of _make_compute_hu -- a silent miss of the conjugate would lose orthogonality and misconverge. - test_cg_inf_breakdown_returns_positive_info: regression guard for the per-iter-sync collapse. Runs cg on a deliberately singular SPD operator and asserts info > 0 (not zero, not -1) so a future re-introduction of explicit breakdown syncs would still pass but a regression to the old info contract would not.

… __del__ against shutdown races Closes audit items IntelPython#5 and IntelPython#29 from the prior solver review. Item IntelPython#4 (_matmat default uses a per-column matvec loop) is closed as wontfix: SciPy's scipy.sparse.linalg.LinearOperator and cupyx's analogue both ship the same hstack-of-matvecs default, so dpnp matches the reference exactly and there is no portable improvement to make without subclass-level _matmat overrides (which _CustomLinearOperator already exposes via its matmat= constructor argument). scipy/sparse/linalg/_interface.py - Set __array_ufunc__ = None on the LinearOperator base class. This is the SciPy contract: a host numpy.ndarray on the left of np_array * linop or np_array @ linop previously triggered NumPy's ufunc dispatch first, which would attempt to broadcast the operator element-wise before falling back to its reflected operator method -- producing either an opaque error or a wrong- typed result. With __array_ufunc__ = None NumPy returns NotImplemented from the ufunc protocol and Python's operator dispatch falls through cleanly to LinearOperator.__rmul__ / __rmatmul__. dpnp.ndarray itself sets __array_ufunc__ = None (see dpnp/dpnp_array.py:222) for the same reason, so the two dispatch systems now agree. scipy/sparse/_csr.py, scipy/sparse/linalg/_iterative.py - Harden __del__ in csr_matrix and in _CachedSpMV against the interpreter-shutdown race where the compiled _sparse_impl extension is garbage-collected before the matrix instance whose oneMKL handle it owns. Previous code used a single except Exception: pass which silenced two qualitatively different failure modes: 1. shutdown race -- extension gone, si._sparse_gemv_release evaluates to None or AttributeError; the handle is unrecoverable and leaving the OS to reclaim it at process exit is the only sane option; 2. genuine backend error while the interpreter is healthy -- a real bug we want to surface eventually, but raising from __del__ produces only an 'Exception ignored in:' warning and the handle is gone either way. The new code probes getattr(si, '_sparse_gemv_release', None) explicitly so case (1) takes the fast non-call path, and then splits the except into (AttributeError, TypeError) for case (1)- style residuals (queue / handle attribute access racing the shutdown) versus a final broad except for case (2). Both still return silently from __del__ -- raising is never valid here -- but the intent is now documented and a real backend regression is no longer indistinguishable from the GC race in code review. tests/third_party/cupyx/scipy_tests/sparse_tests/test_linalg.py - test_array_ufunc_opt_out: asserts the __array_ufunc__ = None marker is present on LinearOperator. Mirrors SciPy's own test suite test_interface.py::test_array_ufunc_opt_out. - test_numpy_scalar_times_linop_dispatches_to_rmul: the concrete runtime consequence -- numpy.float64(2.0) * linop must produce a scaled LinearOperator, not raise or yield an array.

Two GMRES tests asked scipy for rtol=1e-8 (real) and rtol=1e-7 (complex). For float32 / complex64 the accumulated rounding error in classical Gram-Schmidt over n Arnoldi steps is already O(eps * sqrt(n)) ~ 4e-7, so even the scipy reference cannot reach those tolerances on a well-conditioned 10x10 / 12x12 system. The failure surfaces on the very first assertion, info_ref == 0, making it look like a dpnp regression when in fact it is a test- design bug in our suite: the relative residual floors out at the dtype's representable precision. Confirmed with a host-side scipy.sparse.linalg.gmres reproduction: dt=<f4 n=10 rtol=1e-08 info=100 rel_resid=1.25e-08 (FAIL) dt=<f4 n=10 rtol=1e-05 info= 0 rel_resid=1.25e-06 (PASS) dt=<c8 n=12 rtol=1e-07 info=120 (FAIL on user's host; rng-sensitive) dt=<c8 n=12 rtol=1e-05 info= 0 rel_resid=2.55e-06 (PASS) dt=<f8 n=10 rtol=1e-08 info= 0 (PASS unchanged) dt=<c16 n=12 rtol=1e-07 info= 0 (PASS unchanged) Patch: derive rtol per dtype -- 1e-5 for the single-precision variants (float32, complex64), 1e-8 / 1e-7 for double precision. The assert_allclose tolerances against the scipy reference also need to be widened for the single-precision branches (5e-4 / 5e-5) for the same noise-floor reason; tightening them would just relocate the false failure from info==maxiter to assert_allclose. double- precision comparisons stay at the original 1e-4 / 1e-5. No solver code changes: both tests still serve their regression- guard purpose. The complex test in particular still exercises the V^T -> V^H conjugate-in-place branch of _make_compute_hu, and the real test still drives the full GMRES restart loop end-to-end.

…ath bug The complex GMRES Arnoldi step had a math bug since the alpha/beta refactor in commit 50e9df4. Pass 1 issued gemv(transpose=T) to get V^T @ u, then post-conjugated h to recover V^H @ u. The identity that conjugation appeals to: conj(V^T @ u) == V^H @ u holds ONLY when u is real-valued. For complex u the post-conjugate actually produces: conj(V^T @ u) == V^H @ conj(u) which is a different vector. Classical Gram-Schmidt loses orthogonality, the Krylov basis collapses, and GMRES never reaches the requested rtol. The bug was caught by the complex64 case of test_gmres_complex_arnoldi_fast_path (info=120, residual ~36 instead of converging). complex128 happened to mask the failure on some configurations because for_dtypes iterates dtypes inside a single test method and the complex64 assertion fires first, never giving complex128 a chance to run. Numpy-level reproduction confirms both single- and double-precision diverge identically with the buggy math. Fix: extend bi._gemv_alpha_beta to accept the full {N, T, C} tri-state of oneMKL transpose modes, so the GMRES inner step can request V^H @ u directly in one kernel rather than emulating it. backend/extensions/blas/gemv.{hpp,cpp}, blas_py.cpp - Replace the transpose: bool parameter on gemv_alpha_beta with trans_op: int (0=N, 1=T, 2=C). The legacy gemv() wrapper, used by dpnp.dot and other non-Hermitian call sites, keeps its bool signature and maps it to {0, 1} internally so no existing caller is affected. - Pass trans_op through gemv_dispatch into the oneMKL gemv call. Conjugate-transpose is restricted to F-contiguous matrices: the row-major remap (treating a C-contig matrix as its column- major transpose) does not extend cleanly to the C op because (A^T)^H == conj(A), which oneMKL does not expose as a gemv mode. Callers needing C on row-major input must F-contigify first; an explicit ValueError documents the constraint. - The cuBLAS-only branch handles trans_op=C the same way: only via the F-contig path. - Update the pybind binding name and docstring accordingly. scipy/sparse/linalg/_iterative.py - _make_compute_hu picks pass1_trans_op = 2 (C) for complex matrices and 1 (T) for real matrices. The closure body becomes a clean two-gemv sequence with no scratch buffer, no in-place conjugate, and no special-case branching. The Krylov basis is F-contiguous by construction (allocated with order='F' at the top of gmres()), so the F-contig restriction on trans_op=C is automatically satisfied. - Drop the dpnp.tensor._tensor_elementwise_impl import that an intermediate WIP had pulled in for tei._conj; it's no longer needed now that the conjugate happens inside oneMKL. - Update the docstring to explain why the post-hoc conjugate approach is wrong and why C-mode is the correct fix, so the next reader does not re-introduce the bug.

Audit-driven follow-up to the LinearOperator review. _iterative.py already follows dpnp's strict-coercion contract (inputs must be dpnp arrays or true scalars; never numpy arrays; transfers are explicit). _interface.py was written from the SciPy reference, where host scalars are routine, so a few implicit-transfer paths slipped in. This commit tightens four of them. Fix 1 (HIGH) -- LinearOperator.dot() Previously called dpnp.asarray(x) on any non-scalar, non- LinearOperator argument. That silently host-to-device-copied a numpy.ndarray on every call, masking real device / queue selection bugs in caller code. Now rejects numpy.ndarray with a directed TypeError telling the user to convert via dpnp.asarray() explicitly; rejects any other non-dpnp non-scalar non-LinearOperator with a generic 'type not understood' error. This changes user-visible behaviour: linop @ numpy_array, which used to silently upload, now raises. The companion test test_dot_accepts_dpnp_array_after_explicit_transfer pins the documented workaround. Fix 2 (MEDIUM) -- _ScaledLinearOperator.__init__ dtype inference Previously passed type(alpha) (a Python class object) to _get_dtype, which delegated to dpnp.result_type. For Python scalars this collapsed every float to float64, silently widening a float32 operator's matvec results to float64 on the device whenever scaled by 2.0 etc. Now uses numpy.array(alpha).dtype (a pure-host op that returns the smallest dtype holding alpha) so float32 stays float32. Regression guard: test_scaled_operator_preserves_float32_dtype. Fix 3 (MEDIUM) -- _ScaledLinearOperator conjugates via numpy not dpnp _rmatvec, _rmatmat, _adjoint did dpnp.conj(self.args[1]) on a host scalar (alpha). Depending on dpnp version, dpnp.conj on a scalar either returns a host scalar (lucky) or promotes to a 0-D dpnp array (one-element device upload per matvec). numpy.conj guarantees the host result for host scalars. The follow-up multiplication enters dpnp's __rmul__ chain identically to the unconjugated _matvec/_matmat paths, so device dispatch is unchanged. Fix 7 (cosmetic) -- aslinearoperator drops dead try/except The numpy import there was wrapped in try/except ImportError as a defensive maneuver, but numpy is a hard dpnp dependency (imported unconditionally at module top), so the exception path was dead code that obscured intent. Replaced with a plain isinstance(A, numpy.ndarray) check. Fix 4 (doc only) -- _PowerLinearOperator int(p) hazard comment int(p) inside _PowerLinearOperator.__init__ would force a device sync if p were a dpnp 0-D array; in practice the entry guard in LinearOperator.__pow__ (dpnp.isscalar(p), which is False for any 0-D dpnp array) prevents this. Added a comment so a future caller-side change does not re-introduce the hazard.

…pe on host scalars

…nfo/dtype/lstsq

…agusetty/dpnp into feature-sparse-linalg-solvers

abagusetty and others added 30 commits April 2, 2026 12:52

Add sparse.linalg iterative solvers

a7c8d31

Add sparse.linalg iterative solvers

e3e9052

Add sparse.linalg iterative solvers

c0e9fb5

Add sparse.linalg iterative solvers

943c52f

Add sparse.linalg iterative solvers

1191f7e

Fix deprecated tol kwarg in SciPy host fallback

2384185

Add tests for scipy.sparse.linalg: LinearOperator, cg, gmres, minres

58cc44b

Fix implicit numpy conversion; use .asnumpy() for dpnp arrays

3a50062

add onemkl sparse gemv pybind logic

8c68d98

sparse: add pybind11 module, CMakeLists, and hook _sparse_impl into _…

0c4a888

…iterative.py

Remove redundant sparse_gemv.hpp passthrough header

4993120

sparse: gemv.cpp includes gemv.hpp directly

a7ddc1c

sparse: capture exec_q from CSR data at closure construction

14cb5c4

minor cleanup for sparse extensions

ed58333

Fix SyntaxError: remove stray backslash in aslinearoperator hasattr s…

0a32b57

…tring Line 477: `hasattr(A, "rmatmat\")` had a Markdown-escaped backslash leaked into the Python source, causing an unterminated string literal. Fixed to `hasattr(A, "rmatmat")`.

sparse/linalg: pure-GPU CG/GMRES/MINRES, drop all CPU fallback paths,…

4292518

… port SciPy corner cases

Fix dtype.char AttributeError on dpnp dtype objects in CG/GMRES/MINRES

125dab5

Fix M guard, MINRES betacheck decay and gbar init in Paige-Saunders r…

2d753cf

…ecurrence

abagusetty and others added 2 commits May 20, 2026 19:37

Fix pylint failures

a375760

Merge branch 'master' into feature-sparse-linalg-solvers

36042c6

antonwolfy reviewed May 24, 2026

View reviewed changes

abagusetty and others added 27 commits May 26, 2026 10:29

Update dpnp/backend/extensions/sparse/CMakeLists.txt

902eeff

Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com>

Update dpnp/backend/extensions/sparse/gemv.cpp

2b2ac0d

Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com>

Update dpnp/backend/extensions/sparse/gemv.cpp

04ad7be

Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com>

Update dpnp/backend/extensions/sparse/types_matrix.hpp

d77e516

Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com>

Update dpnp/backend/extensions/sparse/gemv.cpp

d9734d8

Co-authored-by: Anton <100830759+antonwolfy@users.noreply.github.com>

Merge branch 'IntelPython:master' into feature-sparse-linalg-solvers

8cb5656

Merge branch 'IntelPython:master' into feature-sparse-linalg-solvers

efe896c

add left out files

c3aae1d

Merge branch 'feature-sparse-linalg-solvers' of https://github.com/ab…

ec72540

…agusetty/dpnp into feature-sparse-linalg-solvers

_interface: drop module-level numpy import; use .conjugate() and .dty…

16f7450

…pe on host scalars

_interface: drop redundant numpy mentions from comments

f910092

_iterative: use math.* for host scalar arithmetic; numpy stays for fi…

cfabd12

…nfo/dtype/lstsq

_iterative: drop dead stores and xtype alias

6b2b645

sparse: address pylint findings from pre-commit

7f49318

sparse: apply isort fixes

70e3c30

Fix formatting

ec1797a

test_linalg: align style with sibling cupyx tests

7b05eef

Merge branch 'feature-sparse-linalg-solvers' of https://github.com/ab…

2b43800

…agusetty/dpnp into feature-sparse-linalg-solvers

Merge branch 'master' into feature-sparse-linalg-solvers

0182d6a

Move msg from v20 to v21

04e4876

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature sparse linalg solvers#2841

Feature sparse linalg solvers#2841
abagusetty wants to merge 93 commits into
IntelPython:masterfrom
abagusetty:feature-sparse-linalg-solvers

abagusetty commented Apr 9, 2026 •

edited

Loading

Uh oh!

Uh oh!

antonwolfy May 11, 2026

Uh oh!

Uh oh!

Uh oh!

antonwolfy May 11, 2026

Uh oh!

Uh oh!

Uh oh!

antonwolfy May 24, 2026

Uh oh!

antonwolfy May 24, 2026

Uh oh!

abagusetty May 29, 2026

Uh oh!

antonwolfy May 24, 2026

Uh oh!

abagusetty May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants


		pybind11_add_module(${python_module_name} MODULE ${_module_src})
		add_sycl_to_target(TARGET ${python_module_name} SOURCES ${_module_src})



	# Ensure Cython modules build first so _usmarray.h exists
	add_dependencies(${python_module_name} _usmarray)

		// dpctl namespace alias for type dispatch utilities
		namespace dpctl_td_ns = dpnp::tensor::type_dispatch;

Conversation

abagusetty commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

antonwolfy May 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

antonwolfy May 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

antonwolfy May 24, 2026

Choose a reason for hiding this comment

Uh oh!

antonwolfy May 24, 2026

Choose a reason for hiding this comment

Uh oh!

abagusetty May 29, 2026

Choose a reason for hiding this comment

Uh oh!

antonwolfy May 24, 2026

Choose a reason for hiding this comment

Uh oh!

abagusetty May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

abagusetty commented Apr 9, 2026 •

edited

Loading